Lecture 7
expect_true()

expect_true() when I write tests
testthat functions, I’m just not used to using them and I forget what they are)
expect_true() is expecting (in a literal sense) only one TRUE

set.seed(0)
vec1 <- stats::rpois(5, lambda = 1)
set.seed(0)
vec2 <- stats::rpois(5, lambda = 1)
set.seed(1)
vec3 <- stats::rpois(5, lambda = 1)
vec1
## [1] 2 0 1 1 2
vec2
## [1] 2 0 1 1 2
vec3
## [1] 0 1 1 2 0
testthat::expect_true(vec1 == vec2)
## Error: vec1 == vec2 is not TRUE
##
## `actual`: TRUE TRUE TRUE TRUE TRUE
## `expected`: TRUE
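
One way to resolve the error above is to collapse the element-wise comparison into a single logical value before handing it to expect_true(). This is a minimal sketch that assumes the testthat package is installed. The expect_equal() call at the end is an alternative from the same package and is an addition here, not something taken from the lecture.

```r
library(testthat)

set.seed(0)
vec1 <- stats::rpois(5, lambda = 1)
set.seed(0)
vec2 <- stats::rpois(5, lambda = 1)
set.seed(1)
vec3 <- stats::rpois(5, lambda = 1)

# vec1 == vec2 is a logical vector of length 5, so collapse it with all()
same_12 <- all(vec1 == vec2)
expect_true(same_12)

# to show that two vectors are NOT the same, check that any element differs
differ_13 <- any(vec1 != vec3)
expect_true(differ_13)

# alternative (an addition, not from the lecture): compare whole objects at once
expect_equal(vec1, vec2)
```

Both all() and any() return exactly one TRUE or FALSE, which is the shape expect_true() is looking for.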
Wrap the comparison in all() (or, when you want to show two things aren’t the same, any())

Imports: vs. Suggests:

Imports: versus Suggests: in your DESCRIPTION file
Imports: will ensure the user has all these packages installed when they load your package
Suggests: is “it’ll be nice if a user has these packages installed, in which case, they’ll load them. But if the user doesn’t have these packages installed, my package can still be loaded.”

k or larger (for some value of k)?
2^30=1073741824 graphs to find if there’s a clique of size k or larger.
NAs or NaNs.

One: Find how to reproduce your errors
Use the set.seed() function.

Two: Tracing your code to find the specific line that fails
median_random_rowSums <- function(mat,
                                  trials = 1000){
  p <- ncol(mat)
  rowsum_vec <- sapply(1:trials, function(trial){
    bool_vec <- stats::rbinom(p, size = 1, prob = 0.5)
    idx <- which(bool_vec == 1)
    mat_tmp <- mat[,idx]
    return(rowSums(mat_tmp))
  })
  stats::median(rowsum_vec)
}

mat <- matrix(1:25, nrow = 5, ncol = 5)
median_random_rowSums(mat)
## Error in rowSums(mat_tmp): 'x' must be an array of at least two dimensions

Although the error surfaced in the rowSums(mat_tmp) function, the real murder was done in mat_tmp <- mat[,idx]. If idx were just a vector of length one, mat_tmp would actually be a vector (and not a matrix), which does not work with the rowSums() function.

Non-interactive:
Add print() statements in strategic places in your code to figure out the status of your code at different times.

median_random_rowSums <- function(mat,
                                  trials = 1000){
  p <- ncol(mat)
  rowsum_vec <- sapply(1:trials, function(trial){
    print(paste0("Trial: ", trial))
    bool_vec <- stats::rbinom(p, size = 1, prob = 0.5)
    idx <- which(bool_vec == 1)
    print(idx)
    mat_tmp <- mat[,idx]
    vec <- rowSums(mat_tmp)
    print(vec)
    return(vec)
  })
  stats::median(rowsum_vec)
}

set.seed(0) # to reproduce my errors!
mat <- matrix(1:25, nrow = 5, ncol = 5)
median_random_rowSums(mat)
## [1] "Trial: 1"
## [1] 1 4 5
## [1] 38 41 44 47 50
## [1] "Trial: 2"
## [1] 2 3 4 5
## [1] 54 58 62 66 70
## [1] "Trial: 3"
## [1] 4
## Error in rowSums(mat_tmp): 'x' must be an array of at least two dimensions
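
The trace above fails on the trial where idx has length one. A possible fix, sketched here under the assumption that a single selected column (or no column at all) should still be treated as a matrix, is to subset with drop = FALSE. The name median_random_rowSums_fixed is hypothetical and is used only to distinguish this version from the original.

```r
median_random_rowSums_fixed <- function(mat,
                                        trials = 1000){
  p <- ncol(mat)
  rowsum_vec <- sapply(1:trials, function(trial){
    bool_vec <- stats::rbinom(p, size = 1, prob = 0.5)
    idx <- which(bool_vec == 1)
    # drop = FALSE keeps mat_tmp a matrix even if idx has length 0 or 1
    mat_tmp <- mat[, idx, drop = FALSE]
    return(rowSums(mat_tmp))
  })
  stats::median(rowsum_vec)
}

# the subsetting behavior in isolation
mat <- matrix(1:25, nrow = 5, ncol = 5)
is.matrix(mat[, 4])                # FALSE: a plain vector
is.matrix(mat[, 4, drop = FALSE])  # TRUE: a 5-by-1 matrix

set.seed(0) # same seed as the failing run
res <- median_random_rowSums_fixed(mat)
```

With drop = FALSE, a trial that selects no columns yields a 5-by-0 matrix whose row sums are all zero. Whether that is the desired behavior depends on what the function is meant to compute.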
Interactive:
Use the browser() function or set breakpoints in RStudio.

Terminal program on your Mac (Terminal)
Windows PowerShell on your laptop already, which is all you need.

Alternatively 1:
Command Prompt is not the same thing as BASH. See https://attuneops.io/difference-between-cmd-vs-powershell-vs-bash/
If you use Command Prompt to ssh into Bayes, then once you’ve ssh-ed into Bayes, your Command Prompt will behave like a Linux terminal.

Alternatively 2:
Git BASH application

cd: Change directory
ls: List the files inside a folder
mv: Move a file/folder
cp: Copy a file/folder
git status: Look at the status of your Git repository (which files have changed, whether they have been staged, etc.)
git add: Add (i.e., “stage”) a file to be committed
git commit: Commit all the staged files
git push: Push your repository from your current location onto GitHub.com
git pull: Pull your repository from GitHub.com to your current location

Terminal (for Macs) or Windows PowerShell (for Windows):
ssh [username]@bayes.biostat.washington.edu

Two ways:
Via GitHub – great for code and figures
Via scp in the Terminal (for Macs) or in
Windows PowerShell (or Command Prompt or
MobaXterm) (for Windows) – great for data and
results
For our course, we will primarily use GitHub to transfer files between your laptop and Bayes.
scp (hopefully) sometime later in the course

Put the two new files into your UWBiost561 R package under the vignettes folder
Open Terminal (on Mac) or Windows PowerShell (on Windows)
Navigate to your UWBiost561 R package via the cd command
Type git status into the command line. This should show your Git repository’s status, and it should also show the 2 new files you’ve just added
Type git add * into the command line. This is a “lazy way” to simply add/“stage” all your files for the commit. (This is functionally equivalent to clicking the check-box when you added files via RStudio)
Type git commit -m "Pushing code for server" into the command line. (In general, the text inside "..." is your commit message.)
Type git push origin main. It will ask you for your username and your GitHub PAT (this is the super-secret password you made in HW1. It’s the one that starts with ghp_…)

Log into Bayes via Terminal or Windows PowerShell (see a few slides ago)
Your GitHub repository URL (e.g., https://github.com/UW561/UWBiost561)
In Terminal or Windows PowerShell, type cd ~. (This makes sure you are in your home directory.)
git clone https://github.com/UW561/UWBiost561 (or whatever your GitHub repository URL is – this will require you to enter your GitHub username and your GitHub PAT)
cd UWBiost561/ (you can tab-complete)
ls
cd vignettes/ and then look at the contents of the folder via ls

Okay, so what exactly did you put into your UWBiost561 package that’s now on Bayes as well?
There are multiple moving parts:
.R script: demo_bayes.R is a simple R script that computes the eigen-decomposition of a big, random matrix.
.slurm script: demo_bayes.slurm is a shell script (i.e., code that interacts with an operating system directly) that tells Bayes how to run demo_bayes.R

demo_bayes.R (nothing too special)

# store some useful information
date_of_run <- Sys.time()
session_info <- devtools::session_info()
set.seed(10)

# generate a random matrix
p <- 2000
mat <- matrix(rnorm(p^2), p, p)
mat <- mat + t(mat)

# print out some elements of the matrix
print(mat[1:5,1:5])

# compute eigenvalues
res <- eigen(mat)

# save the results
save(mat, res,
     date_of_run, session_info,
     file = "~/demo_bayes_output.RData")

print("Done! :)")

.slurm script?
What does a .slurm script look like?

#!/bin/bash
#SBATCH --job-name=demo
#SBATCH --account=biostat
#SBATCH --partition=students-12c128g
#SBATCH --time=12:00:00
#SBATCH --mem-per-cpu=10gb
R CMD BATCH --no-save --no-restore demo_bayes.R

R CMD BATCH is the command-line command to run a .R file
--no-save --no-restore are optional: --no-save skips saving the workspace at the end of the run, and --no-restore skips loading a previously saved workspace. (They make your life just a bit easier, so might as well keep them around.)
demo_bayes.R, which is the .R file you wish to run
This line (in demo_bayes.slurm) is telling Bayes: “Hey, I wish to run the script demo_bayes.R”
The option is --job-name, and the value I’m setting it to is demo
The option is --account, and the value I’m setting it to is biostat
The option is --partition, and the value I’m setting it to is students-12c128g
Aside from the R CMD BATCH line, this is the next most important line in the entire SLURM script
The options are --time and --mem-per-cpu, and the values I’m setting them to are 12:00:00 and 10gb (that is, a 12-hour time limit and 10 GB of memory per CPU)

Navigate to vignettes/ on Bayes (using cd)
Type ls. You should see demo_bayes.R and demo_bayes.slurm (along with all your other HW vignettes)
Type sbatch demo_bayes.slurm. You should see Submitted batch job ...
Type squeue --me. This will show you which jobs you are running and their status
In the vignettes/ directory, you’ll see demo_bayes.Rout (and a slurm-[job ID].out – we usually don’t need to look at the latter file).
Type cat demo_bayes.Rout into the command line to see what the contents of this file are. It’s a text file, and it literally prints out everything that happened in the R session that you ran in the background
The results were saved in demo_bayes.R via save(..., file = "~/demo_bayes_output.RData"). The file is under our home directory. Navigate to it via cd ~.
How do we look at the results (demo_bayes_output.RData)? Bayes is a server that has R, so you can just open up R! Type in R.

Question: Why didn’t we just run our demo_bayes.R script in this interactive R session (albeit without a fancy GUI like RStudio)?
Answer: More on this next week when we talk about server etiquette!!
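
Once R is open on Bayes, the saved objects can be brought back with load(). The sketch below is a self-contained stand-in rather than the actual session: it uses a small matrix and a temporary file (both assumptions, chosen so that it runs anywhere) in place of ~/demo_bayes_output.RData, and it loads into a separate environment so nothing already in the workspace gets overwritten.

```r
# stand-in for demo_bayes.R: a small symmetric random matrix
set.seed(10)
p <- 5
mat <- matrix(stats::rnorm(p^2), p, p)
mat <- mat + t(mat)
res <- eigen(mat)
date_of_run <- Sys.time()

# on Bayes this would be "~/demo_bayes_output.RData"
out_file <- tempfile(fileext = ".RData")
save(mat, res, date_of_run, file = out_file)

# load() returns the names of the restored objects
loaded_env <- new.env()
loaded_names <- load(out_file, envir = loaded_env)
print(loaded_names)

# inspect the eigenvalues of the reloaded result
print(head(loaded_env$res$values))
```

On Bayes itself, calling load("~/demo_bayes_output.RData") and then ls() would do the same with the real output file.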

.slurm scripts

The extension of the SLURM script (demo_bayes.slurm) does not technically need to be .slurm. The file also does not technically need to be called demo_bayes.
.slurm files are associated with which .R files

UWBiost561 R package on the server
vim. But more importantly, it’s very easy to have conflicting versions of the UWBiost561 package if you simultaneously make changes on your local laptop and on Bayes)
UWBiost561 repository
compute_maximal_partial_clique()
R folder in your UWBiost561 package